Automated Classification of Pulmonary Nodules through a Retrospective Analysis of Conventional CT and Two-phase PET Images in Patients Undergoing Biopsy

Objective(s): Positron emission tomography/computed tomography (PET/CT) examination is commonly used for the evaluation of pulmonary nodules since it provides both anatomical and functional information. However, given the dependence of this evaluation on physician’s subjective judgment, the results could be variable. The purpose of this study was to develop an automated scheme for the classification of pulmonary nodules using early and delayed phase PET/CT and conventional CT images. Methods: We analysed 36 early and delayed phase PET/CT images in patients who underwent both PET/CT scan and lung biopsy, following bronchoscopy. In addition, conventional CT images at maximal inspiration were analysed. The images consisted of 18 malignant and 18 benign nodules. For the classification scheme, 25 types of shape and functional features were first calculated from the images. The random forest algorithm, which is a machine learning technique, was used for classification. Results: The evaluation of the characteristic features and classification accuracy was accomplished using collected images. There was a significant difference between the characteristic features of benign and malignant nodules with regard to standardised uptake value and texture. In terms of classification performance, 94.4% of the malignant nodules were identified correctly assuming that 72.2% of the benign nodules were diagnosed accurately. The accuracy rate of benign nodule detection by means of CT plus two-phase PET images was 44.4% and 11.1% higher than those obtained by CT images alone and CT plus early phase PET images, respectively. Conclusion: Based on the findings, the proposed method may be useful to improve the accuracy of malignancy analysis.


Introduction
Lung cancer is a leading cause of cancer mortality among men and women. This disease is a serious public health problem in many countries (1). When a lung nodule is found during cancer screening, it is important to accurately classify the lesion as benign or malignant in order to institute appropriate therapy and improve the survival rate.
Computed tomography (CT) is often used for lung cancer screening (2). According to the results of a national lung screening trial carried out in the United States (3), screening with low-dose CT scans reduced lung cancer mortality by 20%. Therefore, CT is regarded as a suitable diagnostic tool for the early detection of lung cancer. If a suspicious lesion is found by CT examination, positron emission tomography (PET)/CT examination is performed for detailed analysis. In this combined technique, PET images provide functional information while CT images render anatomical information, thereby facilitating a comprehensive analysis of the malignancy of nodules.
However, fluorodeoxyglucose (FDG) PET images of benign nodules, such as those associated with inflammatory diseases, often exhibit high uptake values similar to the images of malignant nodules; furthermore, their anatomical structures are similar. Therefore, it is often difficult to differentiate between benign and malignant nodules (4). In such cases, a bronchoscopic biopsy is performed; however, the procedure is invasive, and the patient faces great physical hardship.
In these cases, if the CT and PET images can be analysed in detail to quantify the degree of malignancy of the nodules, the need for excessive biopsy with its accompanying physical hardship can be reduced. With this background in mind, the present study focused on the automated analysis of the malignant potential of pulmonary nodules using PET/CT images.
Many studies have investigated the benign/malignant differentiation of pulmonary nodules by image analysis (5-10). For instance, Armato et al. proposed the automated analysis of pulmonary nodules using linear discriminant analysis with characteristic features obtained from CT images (5). Their evaluation of 470 CT scans yielded an area under the receiver operating characteristic (ROC) curve of 0.79. In addition, Shen et al. (9) introduced a deep learning model, the multi-crop convolutional neural network, to classify pulmonary nodules. The authors used 880 benign nodules and 495 malignant nodules from the lung image database consortium and image database resource initiative (LIDC/IDRI) dataset and obtained an accuracy of 87.14% with the model. Nie et al. developed a semi-automated scheme for distinguishing between benign and malignant pulmonary nodules by integrating PET and CT information. They evaluated three computer-aided diagnosis schemes based on an artificial neural network to distinguish between benign and malignant pulmonary nodules using clinical information and image features. They reported that the combined use of PET and CT yielded a higher diagnostic accuracy than the use of CT alone or PET alone (10).
However, to the best of our knowledge, the automated calculation of the characteristic values of pulmonary nodules based on PET and CT images has not been developed. The automated classification of pulmonary nodules can have great practical value. Therefore, the present study proposes an automated classification scheme for pulmonary nodules using CT and PET images. The major objective of our study was to develop the characteristic features using both CT and PET images. Furthermore, we developed a classification method using random forest, which is an ensemble machine learning technique.
In this paper, first, the architecture of the developed classification method is described. In addition, the effectiveness of the classification of pulmonary nodules as evaluated with the original CT and PET image database is discussed.

Methods
This study was approved by the Institutional Review Board. Informed consent was obtained from all patients under the condition that all data were anonymized (No. HM17-002). The current study was conducted on 36 early and delayed phase PET/CT images obtained from patients with a suspected diagnosis of lung cancer. In addition, conventional CT images at maximal inspiration were analysed. The cases were chosen from those whose differential diagnosis was difficult with diagnostic imaging alone, and final diagnosis was made by bronchoscopy and biopsy analysis.
The PET/CT imaging studies were performed by means of Siemens True Point mCT (Siemens). Both images were obtained with a matrix size of 200×200 pixels (voxel size: 4.07×4.07×2.00 mm³, scan time: 2.0 min/table) with free breathing. Image reconstruction was performed using the 3D-OSEM reconstruction algorithm.
In addition, point-spread function and time-of-flight (PSF+TOF) corrections were applied, and attenuation correction was performed using the CT images. The PET images were converted to the voxel size of the CT image after reconstruction. Early and delayed PET imaging was performed 60 min and 120 min after the administration of 3.7 MBq/kg of FDG, respectively. These PET and CT images were aligned automatically by the PET/CT scanner.
The conventional CT imaging was performed using Aquilion ONE (Toshiba, Tokyo, Japan) with a matrix size of 512×512 pixels (voxel size: 0.625×0.625×0.500 mm³) with the lung kernel. When the CT examination was carried out more than once, the images acquired closest in time to the PET/CT examination were selected.
Out of a total of 18 benign cases, 13 cases were finally diagnosed by biopsy, and the remaining 5 cases were confirmed to be benign by a follow-up examination of at least 3 years. Out of 18 malignant cases, 1, 4, and 13 cases were small cell carcinoma, squamous cell carcinomas, and adenocarcinomas, respectively. The mean ages of the patients in the malignant and benign nodule groups were 72.2±7.6 and 65.3±10.2 years, respectively. Figure 1 depicts an overview of the proposed method. In this method, regions designated as suspicious in the PET and CT images by the doctor were analysed using several characteristic features, and then automatically classified as benign or malignant.

Volume of Interest (VOI) extraction
The position and diameter of the nodule to be analysed using conventional CT and PET/CT images were specified by the physician. Accordingly, the segmentation of the volume of interest (VOI) around the pulmonary nodule was carried out on the CT and PET images for analysis. The centre coordinates of the VOIs extracted from the conventional CT and PET/CT images were manually set while checking the MPR images of the CT images.
First, the trans-axial image with the largest nodule area was located, and its centre coordinates were specified manually. Then, the longest diameter in the image was set as the diameter, D_xy, in the x-y direction (trans-axial plane) of the nodule. Subsequently, while changing the slice position in the direction of the body axis, the range of slices in which the nodule was present was obtained and set as D_z. The VOI was extracted from the original image as a box whose three sides measured 2D_xy, 2D_xy, and 2D_z pixels. Table 1 shows the checkpoints for distinguishing between benign and malignant nodules (11,12). The physicians created this scheme by referring to the pixel values, such as the uptake value of the PET images and CT values. Furthermore, consideration was given to such factors as nodule components (e.g., ground glass opacity [GGO] or solid), shapes (roundness), clarity of the nodule border, and spiculas. In this study, these points were quantified as characteristic features.
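The VOI cropping step described in this subsection could be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `extract_voi`, the nested-list volume layout `volume[z][y][x]`, and the clipping of the box at the image border are all assumptions made for the example.

```python
def extract_voi(volume, centre, d_xy, d_z):
    """Crop a box of 2*d_xy x 2*d_xy x 2*d_z voxels centred on the nodule.

    volume : nested lists indexed as volume[z][y][x] (assumed layout)
    centre : (x, y, z) voxel indices chosen by the physician
    d_xy   : longest nodule diameter in the trans-axial plane, in voxels
    d_z    : number of slices in which the nodule is visible
    """
    cx, cy, cz = centre
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    # Clip the box to the image bounds (an assumption of this sketch).
    x0, x1 = max(cx - d_xy, 0), min(cx + d_xy, nx)
    y0, y1 = max(cy - d_xy, 0), min(cy + d_xy, ny)
    z0, z1 = max(cz - d_z, 0), min(cz + d_z, nz)
    return [[row[x0:x1] for row in plane[y0:y1]] for plane in volume[z0:z1]]
```

Because the conventional CT and PET/CT volumes have different voxel sizes, D_xy and D_z would need to be expressed in the voxel units of whichever volume is being cropped.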

Characteristic features

(i) Pixel intensities of PET and CT images
Many malignant nodules have a high pixel intensity in PET and CT images. Therefore, the standardised uptake values (SUV) (13) of early and delayed PET images were defined as ESUV and DSUV, respectively. Furthermore, the difference in SUV between the delayed and early phases was defined as ΔSUV. In the measurement of SUV, two methods were introduced, namely SUV_max (hottest voxel) and SUV_peak (maximum average SUV within a 1 cm³ spherical volume). In the CT images, the CT value at the centre of the nodule (CT_centre) and the maximum CT value inside the nodule (CT_max) were calculated.
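The two SUV measurements defined above could be computed along the following lines. This is a sketch under stated assumptions: the VOI is a nested list `voi[z][y][x]` already expressed in SUV units, spheres that extend past the VOI border are simply clipped, and ΔSUV is taken as delayed minus early (the text does not state the sign convention). All function names are illustrative.

```python
import math

def suv_max(voi):
    """Hottest voxel in a VOI given as nested lists voi[z][y][x]."""
    return max(v for plane in voi for row in plane for v in row)

def suv_peak(voi, spacing_mm):
    """Largest mean SUV over 1 cm^3 spheres centred on each voxel."""
    sx, sy, sz = spacing_mm
    # Radius (mm) of a sphere with a volume of 1 cm^3, roughly 6.2 mm.
    radius = 10.0 * (3.0 / (4.0 * math.pi)) ** (1.0 / 3.0)
    rx, ry, rz = int(radius // sx), int(radius // sy), int(radius // sz)
    # Voxel offsets whose centres fall inside the sphere.
    offsets = [
        (dx, dy, dz)
        for dz in range(-rz, rz + 1)
        for dy in range(-ry, ry + 1)
        for dx in range(-rx, rx + 1)
        if (dx * sx) ** 2 + (dy * sy) ** 2 + (dz * sz) ** 2 <= radius ** 2
    ]
    nz, ny, nx = len(voi), len(voi[0]), len(voi[0][0])
    best = float("-inf")
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                # Clip the sphere at the VOI border (assumption of this sketch).
                vals = [
                    voi[z + dz][y + dy][x + dx]
                    for dx, dy, dz in offsets
                    if 0 <= z + dz < nz and 0 <= y + dy < ny and 0 <= x + dx < nx
                ]
                best = max(best, sum(vals) / len(vals))
    return best

def delta_suv(early, delayed):
    """Delayed-phase SUV minus early-phase SUV (assumed sign convention)."""
    return delayed - early
```

The brute-force search is adequate for a small VOI; a convolution with a spherical kernel would be the usual choice for whole-volume processing.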

(ii) Shape
With regard to the shape of the nodules, malignant nodules often have a ball-like shape, while the benign ones have a line-like shape. To evaluate the ball-like and line-like shapes, a method using a Hessian matrix was proposed (14). The Hessian matrix was obtained by taking the second-order differential of the three-dimensional image f(x, y, z) as follows:

$$H = \begin{pmatrix} f_{xx} & f_{xy} & f_{xz} \\ f_{yx} & f_{yy} & f_{yz} \\ f_{zx} & f_{zy} & f_{zz} \end{pmatrix} \tag{1}$$

Then, three eigenvalues (λ_1, λ_2, λ_3) were obtained from the matrix. Finally, the ball-like and line-like features (L_mass and L_line) were calculated using the eigenvalues as follows: (2) (3)
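As an illustration of the Hessian construction described above, the second-order differentials at a single interior voxel can be approximated with central differences. This sketch assumes unit voxel spacing and a nested-list volume `f[z][y][x]`; the function name is hypothetical, and the paper does not state which discretisation or smoothing was actually used. The eigenvalues would then be obtained from the returned symmetric matrix with any standard numerical routine.

```python
def hessian_at(f, x, y, z):
    """Central-difference Hessian of a volume f[z][y][x] at an interior voxel."""
    def v(dx, dy, dz):
        return f[z + dz][y + dy][x + dx]

    c = v(0, 0, 0)
    # Pure second derivatives along each axis.
    fxx = v(1, 0, 0) - 2 * c + v(-1, 0, 0)
    fyy = v(0, 1, 0) - 2 * c + v(0, -1, 0)
    fzz = v(0, 0, 1) - 2 * c + v(0, 0, -1)
    # Mixed derivatives from the four diagonal neighbours in each plane.
    fxy = (v(1, 1, 0) - v(1, -1, 0) - v(-1, 1, 0) + v(-1, -1, 0)) / 4.0
    fxz = (v(1, 0, 1) - v(1, 0, -1) - v(-1, 0, 1) + v(-1, 0, -1)) / 4.0
    fyz = (v(0, 1, 1) - v(0, 1, -1) - v(0, -1, 1) + v(0, -1, -1)) / 4.0
    # The matrix is symmetric because mixed derivatives commute.
    return [[fxx, fxy, fxz], [fxy, fyy, fyz], [fxz, fyz, fzz]]
```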

(iii) Contrast of the nodule border
The border of a malignant nodule is often unclear. Therefore, the contrast of the border was evaluated using the difference between the CT values of the outer and inner borderlines of the nodules. In order to calculate this value, the average CT values at the pixels belonging to the inner edge R1 (CT_R1) and the peripheral region R2 (CT_R2) were obtained, and the difference between the two values, |CT_R1 - CT_R2|, was defined as the contrast, C_b (Figure 2).
To obtain R1 and R2, the image was first binarized, and the contour was extracted by the Sobel operator. The set of pixels on the outline was defined as R2. Subsequently, the binarized region was shrunk by a morphological operation (erosion) with a structural element having a radius of 1 pixel, and the contour of the reduced region was extracted in the same manner as described above; a set of these pixels was used as R1.
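The R1/R2 construction above can be sketched in two dimensions as follows. Two substitutions are made for brevity and should be read as assumptions: the outline is found with a 4-neighbour test rather than the Sobel operator used in the paper, and the binarisation threshold is a hypothetical parameter, since the paper does not state how the image was binarised.

```python
def outline(mask):
    """Mask pixels that touch the background (or the image edge) via a 4-neighbour."""
    h, w = len(mask), len(mask[0])
    edge = set()
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                yy, xx = y + dy, x + dx
                if not (0 <= yy < h and 0 <= xx < w) or not mask[yy][xx]:
                    edge.add((y, x))
                    break
    return edge

def erode(mask):
    """Erosion with a radius-1 cross: drop every pixel on the outline."""
    edge = outline(mask)
    return [[bool(v) and (y, x) not in edge for x, v in enumerate(row)]
            for y, row in enumerate(mask)]

def border_contrast(ct, threshold):
    """C_b = |mean CT on inner edge R1 - mean CT on outline R2|."""
    mask = [[v >= threshold for v in row] for row in ct]
    r2 = outline(mask)            # outline of the binarised nodule
    r1 = outline(erode(mask))     # outline after shrinking by one pixel
    if not r1 or not r2:
        raise ValueError("nodule region too small to define R1 and R2")

    def mean(points):
        return sum(ct[y][x] for y, x in points) / len(points)

    return abs(mean(r1) - mean(r2))
```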

(iv) Spicula
The presence of a spicula around the nodule increases the possibility that the nodule is malignant. In this study, spiculas in CT images were detected using a Gabor filter (15,16). The use of a Gabor filter facilitates the visualization of line patterns and their orientations (Figure 3b and 3c). Radial line patterns were extracted from the two images (Figure 3d), and the number of radial components and their ratios were calculated as the features of spiculas, SP_1 and SP_2.

(v) Texture features
The texture pattern of the lung lesion is important for the evaluation of malignancy. Out of the several ways to analyse textures, a method that is proposed by Haralick et al. based on the grey level co-occurrence matrix (GLCM) was employed in the current study (17). The matrix element P(i,j) of GLCM is the set of second order statistical probability values for changes between grey levels of i and j at a particular distance, d, and angle, θ. In this regard, θ represents the counterclockwise angle with respect to the X axis.
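A grey level co-occurrence matrix of the kind described above could be built as in the sketch below. It assumes the image has already been quantised to integer grey levels `0..levels-1`, uses a symmetric count (each pair is counted in both directions), and computes Haralick's contrast purely as an example feature; it is not claimed to be one of the five features used in this study. Function names are illustrative.

```python
def glcm(image, levels, dx, dy):
    """Normalised, symmetric GLCM for the pixel offset (dx, dy).

    For distance d = 1, theta = 0 degrees corresponds to (dx, dy) = (1, 0)
    and theta = 90 degrees to (dx, dy) = (0, -1) in image coordinates.
    """
    h, w = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for y in range(h):
        for x in range(w):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                i, j = image[y][x], image[yy][xx]
                counts[i][j] += 1
                counts[j][i] += 1  # count the pair in both directions
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]

def contrast(p):
    """Haralick contrast: sum over i, j of (i - j)^2 * P(i, j)."""
    return sum((i - j) ** 2 * v for i, row in enumerate(p) for j, v in enumerate(row))
```

A typical call would be `glcm(img, 4, 1, 0)` for the horizontal direction and `glcm(img, 4, 0, -1)` for the vertical direction.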
The GLCM can assess such properties as texture uniformity, directionality, and contrast based on the distribution of the values of the matrix elements. Haralick et al. proposed 14 kinds of characteristic features using GLCM. In our study, it was necessary to limit the number of characteristic features as the number of the analysed cases was small. To obtain the texture features in each direction, the following five types of features were calculated using θ of 0° (T1_0–T5_0) and 90° (T1_90–T5_90) in the trans-axial plane of the CT images: where,

Classification
Identification of benign or malignant nodules was accomplished by means of the obtained characteristic features. In this study, classification was performed using the random forest algorithm (18). Random forest is an ensemble learning method for classification and regression that operates by constructing multiple decision trees and outputting the class that is the mode of the classes of the individual trees. In practice, the input to the random forest was the 25 characteristic values, and the output was a judgment regarding the benignity or malignancy of the nodule. In this study, the maximum number of trees was set at 20. In order to analyse the distinguishing characteristics of this method, three kinds of classification methods were evaluated. These methods included: 1) classification based on CT images alone, 2) classification based on CT images and early phase PET images, and 3) classification based on CT images, as well as early and delayed phase PET images.
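The voting step of a random forest, in which the output is the mode of the individual trees' classes, can be illustrated in a few lines. The sketch covers only the aggregation, not tree construction; the "trees" are hypothetical callables with made-up toy thresholds, and the fraction of malignant votes is returned as a score that could later be thresholded to trace an ROC curve.

```python
from collections import Counter

def forest_predict(trees, features):
    """Aggregate per-tree votes into a class label and a malignancy score.

    trees    : list of callables, each mapping a feature vector to
               "benign" or "malignant" (stand-ins for trained trees)
    features : the characteristic values of one nodule
    """
    votes = [tree(features) for tree in trees]
    # The predicted class is the most frequent vote (the mode).
    label = Counter(votes).most_common(1)[0][0]
    # Fraction of trees voting malignant, usable as a continuous score.
    score = votes.count("malignant") / len(votes)
    return label, score

# Hypothetical toy "trees": each thresholds one feature (illustrative values).
toy_trees = [
    lambda f: "malignant" if f[0] > 2.5 else "benign",
    lambda f: "malignant" if f[1] > 0.0 else "benign",
    lambda f: "malignant" if f[0] + f[1] > 4.0 else "benign",
]
```

With an even number of trees, such as the 20 used here, ties are possible; how they are broken is implementation-dependent and is not specified in the paper.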

Characteristic features
In order to confirm whether individual feature values are useful for distinguishing between benign and malignant nodules, the mean, median, and standard deviation of the values in the two groups were calculated.
The effect sizes (19) were also calculated, together with t-values and p-values from a two-sided t-test. The results are shown in Table 2.
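The per-feature statistics mentioned above could be computed roughly as follows. The paper does not say which effect-size measure or t-test variant was used, so this sketch assumes Cohen's d with a pooled standard deviation and Student's t statistic with pooled variance; the p-value, which requires the t distribution, is left to a statistics library.

```python
import math
import statistics

def pooled_sd(a, b):
    """Pooled sample standard deviation of two groups."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))

def cohens_d(a, b):
    """Effect size: difference of means in units of the pooled SD (assumed measure)."""
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd(a, b)

def student_t(a, b):
    """Two-sample t statistic with pooled variance (assumed test variant)."""
    se = pooled_sd(a, b) * math.sqrt(1.0 / len(a) + 1.0 / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se
```

For two groups of 18 nodules each, the t statistic would be referred to a t distribution with 34 degrees of freedom under these assumptions.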
Abbreviations: ESUV_max: maximum standardised uptake value of early PET images; DSUV_max: maximum standardised uptake value of delayed PET images; ΔSUV_max: difference between ESUV_max and DSUV_max; ESUV_peak: peak standardised uptake value of early PET images; DSUV_peak: peak standardised uptake value of delayed PET images; ΔSUV_peak: difference between ESUV_peak and DSUV_peak; CT_centre: CT value at the centre of the nodule; CT_max: maximum CT value inside the nodule; D_xy and D_z: nodule diameters in the x-y plane (trans-axial) and z-direction; L_mass: ball-like feature; L_line: line-like feature; C_b: |CT_R1 - CT_R2|; SP_1 and SP_2: features of spicula; T1_0–T5_0: texture features for horizontal direction; T1_90–T5_90: texture features for vertical direction.

There was a significant difference between the characteristic features of benign and malignant nodules with regard to SUV. The SUV obtained by PET examination showed the most significant difference and was confirmed to be useful for distinguishing between benignity and malignancy. In addition, some texture features showed significant differences between benign and malignant nodules and were effective for distinguishing between these two states. However, the difference in features with regard to spicula was of low significance. Furthermore, the average CT_max in the benign nodules was higher than that in the malignant nodules.

Results of classification scheme
The nodule classification scheme was evaluated using the receiver operating characteristic (ROC) curve. In the curve, the true positive rate was defined as the ratio of the number of correctly classified malignant nodules to the total number of malignant nodules. On the other hand, the false positive rate was defined as the ratio of the number of misclassified benign nodules to the total number of benign nodules. Furthermore, performance was evaluated by the leave-one-out cross-validation method.
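Using the true and false positive rates as defined above, an ROC curve and its area can be computed from per-nodule scores as in the sketch below. It assumes each nodule has a continuous malignancy score (for example, the one produced for it when it was the held-out case in leave-one-out cross-validation) and a label of 1 for malignant or 0 for benign; the function names are illustrative.

```python
def roc_points(scores, labels):
    """ROC points (false positive rate, true positive rate), from (0, 0) to (1, 1).

    scores : malignancy score per nodule (higher means more suspicious)
    labels : 1 for malignant, 0 for benign
    """
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Lower the threshold past every nodule sharing this score.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

Reading off the false positive rate at a fixed true positive rate on such a curve gives the kind of operating point reported in this section.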
To analyse the distinguishing characteristics of our technique, three methods were evaluated. These methods included: 1) classification based on CT images alone, 2) classification based on CT images and early phase PET images, and 3) classification based on CT images, as well as early and delayed phase PET images. Figure 4 displays the ROC curves of each of the above methods. The areas under the curve (AUC) for methods 1, 2, and 3 were 0.730, 0.860, and 0.895, respectively.
At an accuracy rate of 0.944 for malignant nodules, the accuracy rates of benign nodules for methods 1, 2, and 3 were 0.278, 0.611, and 0.722, respectively. Figure 5 shows trans-axial CT and PET images of nodules classified at the operating point with accuracy rates of 0.722 and 0.944 for the benign and malignant nodules, respectively. There were significant differences between ROC curves 1 and 2 and between curves 1 and 3 (P=0.032 and P=0.021, respectively), whereas the difference between curves 2 and 3 was not significant (P=0.104).

Discussion
The findings of the present study revealed a significant difference between the characteristic features of benign and malignant nodules with regard to SUV (Table 2). The SUV obtained by PET examination was confirmed to be useful for distinguishing between benignity and malignancy. However, given the fact that inflammatory diseases can also result in high SUVs, these values should be combined with other features for correct classification. In addition, some texture features were effective for distinguishing between benign and malignant nodules.
Difference in features with regard to spicula was of low significance. This is because many spiculas are observed even in inflammatory diseases. Furthermore, the average CT max in the benign nodules was higher than that in the malignant ones. This is due to the presence of calcification inside the benign nodules. The overlap of the characteristic features between two classes makes it difficult to distinguish between them using a single feature. Therefore, the integration of multiple characteristic features using the classifier would be more effective.
In the ROC curves (Figure 4), the proposed method based on the combined use of CT and two-phase PET images showed an incorrect detection rate of 0.278 for the benign nodules (accuracy rate: 0.722), whereas the accuracy rate of detecting a malignant nodule was 0.944. Target nodules in this study were "difficult nodules" to differentiate based on CT and PET/CT images. Most of the benign cases were confirmed by biopsy rather than by follow-up examination. Our results indicated that biopsy examination with its accompanying physical hardship to the patient, especially in benign cases, could be reduced by 72.2%.
As shown in Figure 4, the use of PET plus CT images improves the AUC of the ROC curves, compared to the use of CT images alone. This denotes the effectiveness of using both anatomical and functional information together. Based on the ROC curves, there was a significant difference between the analysis using CT alone and the one using both CT and PET images. Among the analyses based on PET plus CT images, no significant difference was found between the analysis using the early phase alone and the one using the early and delayed phases together. However, the false-positive rate at a high true-positive rate is very important in malignancy analysis, and at a true-positive rate of 0.944 the two-phase analysis classified more benign nodules correctly than the early phase analysis (0.722 versus 0.611). On this basis, analysis using two-phase images was still advantageous.
Even with the use of this method, some nodules were misclassified. One of the reasons for this is that the CT and PET images did not show the characteristics that were peculiar to benign or malignant lesions. The future challenge is to introduce novel characteristic features that reflect benignity and malignancy. Moreover, the further development of this method requires the improvement of the classifier's performance. Therefore, it is necessary to compare the performance of this method with that of other classifiers, such as linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), support vector machine (SVM), and artificial neural network (ANN).
There are many reports indicating that machine-learning performances are remarkably improved by using deep learning (20,21). In earlier studies, we applied the deep learning technique for automated nodule detection and classification of lung cancer types (22,23). In the future, we intend to apply the deep learning technique for the classification of pulmonary nodules.

Conclusion
In this study, we have developed a machine learning-based analysis of pulmonary nodules using early and delayed phase PET and conventional CT images in patients undergoing biopsy. As a result, 94.4% of the malignant nodules were identified correctly assuming that 72.2% of the benign nodules were judged correctly. These results indicate that the proposed method may be useful to improve the accuracy of malignancy analysis.